ABSTRACT

Relationships between Bayesian ability estimates and 
the parameters of a normal population distribution are derived in the 
context of classical test theory. Analogies are provided for use as 
approximations in work with item response theory (IRT). The following
issues are addressed: (1) the relationship between the distribution 
of the latent ability variable in a population and the distribution 
of ability estimates; (2) how to proceed in calculating the Bayesian 
estimate if the population distribution is not known; and (3) how to 
estimate the distributions of specified subpopulations when Bayesian
ability estimates have been calculated in accordance with a common 
population distribution. Exact relationships are derived to address 
these questions in the context of classical test theory, assuming 
normally distributed abilities and errors. The analogues that are 
offered are for a not-uncommon IRT context in which the researcher 
has software to calculate the Bayesian IRT estimates for individuals 
under the assumption of a normal population distribution, but 
possesses neither values of the population parameters nor software 
with which to estimate them. (Contains 1 figure and 11 references.) 
(SLD) 
Some Formulas for Use with Bayesian Ability Estimates



Abstract 

Relationships between Bayesian ability estimates and the parameters 
of a normal population distribution are derived in the context of classical test 
theory. Analogues are provided for use as approximations in work with 
item response theory. The following questions are addressed:

♦ What is the relationship between the distribution of the latent ability 
variable in a population, and the distribution of ability estimates? 

♦ Because calculating Bayesian estimates typically requires knowing 
the population distribution, how should one proceed if it is not 
known? 

♦ What if Bayesian ability estimates have been calculated in 
accordance with a common population distribution, but it is later 
desired to estimate the distributions of specified subpopulations? 
Introduction 

From the time of Truman Kelley (1923), Bayesian ability estimates have often been 
used in educational testing. Reasons for doing so range from Novick's theoretical 
arguments for Bayesian inference in general (e.g., Novick and Jackson, 1974) to a more 
practical desire to obtain finite ability estimates for all examinees in the context of item 
response theory (IRT). This paper provides some formulas for practical work with 
Bayesian ability estimates, focusing on the following questions: 

1. What is the relationship between the distribution of the latent ability variable in a
population and the distribution of Bayesian ability estimates? 

2. Because calculating Bayesian estimates typically requires knowing the population 
distribution, how should one proceed if it is not known? 

3. What if Bayesian ability estimates have been calculated using a common population
distribution, but it is later desired to estimate the distributions of specified 
subpopulations? 

Exact relationships are derived to address these questions in the context of classical 
test theory, assuming normally distributed abilities and errors. Analogues are offered as 
computing approximations in a not-uncommon IRT context: A researcher has software to 
calculate Bayesian IRT estimates for individuals under the assumption of a normal 
population distribution, but possesses neither values of the population parameters nor 
software with which to estimate them. 

Classical Test Theory 

Background and Notation 

The symbol $\theta$ denotes a real-valued latent proficiency variable, assumed to follow
a normal distribution in a population of examinees; that is,

$\theta \mid \mu, \sigma^2 \sim N(\mu, \sigma^2)$. (1)

Under classical test theory (CTT) one observes the value of the manifest variable $x$, which
is the sum of the latent variable and an independent, real-valued error or disturbance term $e$:

$x = \theta + e$. (2)

If normality is assumed for the error terms,

$e \sim N(0, \sigma_e^2)$. (3)

Equivalently, the conditional distribution of $x$ given $\theta$ can be written as

$x \mid \theta \sim N(\theta, \sigma_e^2)$. (4)

Together, Equations 1 through 3 imply that

$x \mid \mu, \sigma^2, \sigma_e^2 \sim N(\mu, \sigma^2 + \sigma_e^2)$. (5)

When an individual's $x$ is observed, Equation 4 is interpreted as a likelihood
function for the unobserved $\theta$, denoted $\ell(\theta \mid x)$. Under the assumptions outlined
previously,

$\ell(\theta \mid x) = N(x, \sigma_e^2)$, (6)

a normal distribution with mean $x$ and variance $\sigma_e^2$. The maximum likelihood estimate
(MLE) of an examinee's $\theta$, denoted $\hat\theta$, is therefore simply $x$ in this context, and the
estimation error variance is $\sigma_e^2$.

$N(\mu, \sigma^2)$ is the prior distribution for an examinee's $\theta$ value under CTT. It
represents what is known about $\theta$ before a test score is observed. Suppose $\mu$, $\sigma^2$, and $\sigma_e^2$
are known. The posterior distribution for an individual's $\theta$ after observing $x$ is obtained by
Bayes theorem as $p(\theta \mid x, \mu, \sigma^2, \sigma_e^2) \propto \ell(\theta \mid x)\, p(\theta \mid \mu, \sigma^2)$. If normality is assumed for $e$ and
$\theta$, then the posterior is also normal:

$\theta \mid x, \mu, \sigma^2, \sigma_e^2 \sim N(\bar\theta, \bar\sigma^2)$,

with (posterior) variance

$\bar\sigma^2 = (1 - \rho)\,\sigma^2$, (7)

where $\rho$ is the reliability coefficient, defined by

$\rho = \frac{\sigma^2}{\sigma^2 + \sigma_e^2}$, (8)

and with (posterior) mean

$\bar\theta = \rho x + (1 - \rho)\mu$ (9)



(see Box and Tiao, 1973, pp. 74-75, for a proof). Equations 7 and 9 are familiar as
Kelley's (1923) formulas. $\bar\theta$ is the Bayes mean, or expectation a posteriori (EAP),
estimate of $\theta$ for an examinee with observed response $x$. Because the posterior is normal,
$\bar\theta$ is also the Bayes modal estimate for $\theta$, or the mode of its posterior.
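
Kelley's formulas are simple enough to check by hand. The following is a minimal Python sketch of Equations 7 through 9; the function name `kelley_posterior` and the numerical values are illustrative assumptions, not part of the original paper.

```python
def kelley_posterior(x, mu, sigma2, sigma2_e):
    """Posterior mean and variance of theta given x under normal CTT.

    Illustrative helper (hypothetical name); implements Equations 7-9.
    """
    rho = sigma2 / (sigma2 + sigma2_e)        # reliability, Equation 8
    post_mean = rho * x + (1.0 - rho) * mu    # Bayes mean, Equation 9
    post_var = (1.0 - rho) * sigma2           # posterior variance, Equation 7
    return post_mean, post_var


# Assumed example values: population N(0, 1), error variance 0.25, observed x = 2.
post_mean, post_var = kelley_posterior(x=2.0, mu=0.0, sigma2=1.0, sigma2_e=0.25)
```

With these assumed values the reliability is 0.8, so the observed score of 2.0 is pulled toward the population mean of 0.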

Question 1: What is the relationship between the distribution of the latent 
ability variable in a population, and the distribution of ability 
estimates? 

Because the bottom line in test theory is usually inference about individual
examinees, attention has focused on obtaining scores for individuals that are optimal in one
sense or another. MLEs are consistent and best asymptotically normal estimates of
individuals' $\theta$s; Bayesian estimates minimize the average squared difference between
estimates and true values. A fundamental paradox of test theory is that the distribution of
these "good" estimates of individuals' $\theta$s is not a good estimate of the $\theta$ distribution (Lord,
1969; Mislevy, Beaton, Kaplan, and Sheehan, 1992). In the CTT setting described above,
both MLEs and Bayesian estimates follow normal distributions. Their means are equal to
the mean of $\theta$ in the population, but their variances are not equal to the variance of $\theta$:


For MLEs,

$E(\hat\theta) = E(x) = \mu$ (10)

but

$\mathrm{Var}(\hat\theta) = \mathrm{Var}(x) = \sigma^2 + \sigma_e^2$. (11)

For Bayesian estimates,

$E(\bar\theta) = E[\rho x + (1 - \rho)\mu]$
$= \rho E(x) + (1 - \rho) E(\mu)$
$= \rho\mu + (1 - \rho)\mu$
$= \mu$, (12)

but

$\mathrm{Var}(\bar\theta) = \mathrm{Var}[\rho x + (1 - \rho)\mu]$
$= \rho^2\, \mathrm{Var}(x)$
$= \rho\sigma^2$. (13)

The decomposition of variance implied by Equations 7 and 13 should be noted: the variance
of $\theta$ can be expressed as the sum of the posterior variance (which is the same for all
examinees under CTT) and the variance of the Bayes mean estimates:

$\mathrm{Var}(\theta) = E[\mathrm{Var}(\theta \mid x)] + \mathrm{Var}[E(\theta \mid x)]$
$= \mathrm{Var}(\theta \mid x) + \mathrm{Var}(\bar\theta)$
$= (1 - \rho)\sigma^2 + \rho\sigma^2$
$= \sigma^2$. (14)

From Equation 11, the variance of MLEs is an overestimate of the variance of $\theta$.
From Equation 13, the variance of Bayesian estimates is an underestimate. In both cases,
estimating $\sigma^2$ from the variance of a large sample of individual estimates requires
adjustments. With MLEs, the adjustment implied by Equation 11 is

$\hat\sigma^2 = \mathrm{Var}(\hat\theta) - \sigma_e^2$ (15)

if an estimate of $\sigma_e^2$ is available, or, equivalently,

$\hat\sigma^2 = \rho\, \mathrm{Var}(\hat\theta)$ (16)

if an estimate of $\rho$ is used. With Bayes mean estimates, Equation 13 implies

$\hat\sigma^2 = \mathrm{Var}(\bar\theta)/\rho$. (17)
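
The over- and underestimation described above can be seen in a small simulation. The sketch below is illustrative only: the seed, sample size, and parameter values are assumptions chosen for the demonstration, and the population variance is computed with `statistics.pvariance`.

```python
import random
import statistics

random.seed(12345)  # assumed seed, for reproducibility only

# Assumed true values for the demonstration.
mu, sigma2, sigma2_e = 0.0, 1.0, 0.5
rho = sigma2 / (sigma2 + sigma2_e)  # reliability, Equation 8

# Simulate latent abilities, observed scores (the MLEs under CTT),
# and Bayes mean estimates (Equation 9).
thetas = [random.gauss(mu, sigma2 ** 0.5) for _ in range(20000)]
mles = [t + random.gauss(0.0, sigma2_e ** 0.5) for t in thetas]
bayes = [rho * x + (1.0 - rho) * mu for x in mles]

var_mle = statistics.pvariance(mles)     # near sigma2 + sigma2_e (Equation 11)
var_bayes = statistics.pvariance(bayes)  # near rho * sigma2 (Equation 13)

# Adjusted estimates of sigma2.
est_15 = var_mle - sigma2_e  # Equation 15
est_16 = rho * var_mle       # Equation 16
est_17 = var_bayes / rho     # Equation 17
```

In this setup the raw variance of the MLEs lies above the true variance and that of the Bayes estimates lies below it, while the three adjusted values all fall close to the assumed true variance of 1.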
Question 2: Because calculating Bayesian estimates typically requires
knowing the population distribution, how should one proceed
if it is not known?

Bayesian estimates under the normal-distribution CTT case require the structural
parameters $\mu$, $\sigma^2$, and $\sigma_e^2$. If these are not known, they can be approximated in familiar
ways: Equation 10 for an estimate of $\mu$, an internal consistency estimate for $\rho$, then
Equation 16 followed by Equation 8 for estimates of $\sigma^2$ and $\sigma_e^2$. This section derives an
alternative approach that lends itself better to an IRT analogue. The basic idea is first to
construct Bayesian estimates for $\theta$ by using provisional values for $\mu$ and $\sigma^2$, and then to
employ the mean and variance of the resulting estimates to obtain improved values for $\mu$
and $\sigma^2$. These values can be used in turn to construct improved estimates for individual
examinees.

The provisional values for $\mu$ and $\sigma^2$ may be denoted by $\mu^*$ and $\sigma^{*2}$. Assuming
$\sigma_e^2$ to be known, one defines the following quantities:

$\rho^* = \frac{\sigma^{*2}}{\sigma^{*2} + \sigma_e^2}$

and

$\bar\theta^* = \rho^* x + (1 - \rho^*)\mu^*$.

The expected mean and variance of $\bar\theta^*$ in the population of examinees, denoted
subsequently as $M$ and $S^2$, are derived as follows:

$M = E(\bar\theta^*)$
$= E[\rho^* x + (1 - \rho^*)\mu^*]$
$= \rho^* E(x) + (1 - \rho^*)\mu^*$
$= \rho^*\mu + (1 - \rho^*)\mu^*$ (18)

and

$S^2 = \mathrm{Var}(\bar\theta^*)$
$= \mathrm{Var}[\rho^* x + (1 - \rho^*)\mu^*]$
$= \rho^{*2}\, \mathrm{Var}(x)$
$= \rho^{*2}(\sigma^2 + \sigma_e^2)$. (19)

Given (estimates of) $M$ and $S^2$, one can then solve for $\mu$ and $\sigma^2$ in terms of known
quantities:

$\mu = [M - (1 - \rho^*)\mu^*]/\rho^*$ (20)

and

$\sigma^2 = S^2/\rho^{*2} - \sigma_e^2$. (21)

These relationships require the existence of the moments that are involved, but not
normality.
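
The correction can be sketched as a pair of Python functions: one giving the expected moments of the provisional Bayes estimates (Equations 18 and 19), the other inverting them (Equations 20 and 21). The function names and the example values are illustrative assumptions.

```python
def expected_moments(mu, sigma2, mu_star, sigma2_star, sigma2_e):
    """Expected mean M and variance S2 of provisional Bayes estimates."""
    rho_star = sigma2_star / (sigma2_star + sigma2_e)
    M = rho_star * mu + (1.0 - rho_star) * mu_star   # Equation 18
    S2 = rho_star ** 2 * (sigma2 + sigma2_e)         # Equation 19
    return M, S2


def corrected_moments(M, S2, mu_star, sigma2_star, sigma2_e):
    """Recover mu and sigma2 from M and S2 (Equations 20 and 21)."""
    rho_star = sigma2_star / (sigma2_star + sigma2_e)
    mu = (M - (1.0 - rho_star) * mu_star) / rho_star  # Equation 20
    sigma2 = S2 / rho_star ** 2 - sigma2_e            # Equation 21
    return mu, sigma2


# Assumed example: true mu = 0.5 and sigma2 = 2.0, with a poor provisional
# prior of mu* = 0 and sigma*2 = 1, and a known error variance of 0.5.
M, S2 = expected_moments(0.5, 2.0, mu_star=0.0, sigma2_star=1.0, sigma2_e=0.5)
mu_hat, sigma2_hat = corrected_moments(M, S2, mu_star=0.0, sigma2_star=1.0, sigma2_e=0.5)
```

Because Equations 20 and 21 are exact inverses of Equations 18 and 19, the round trip returns the assumed true values when $M$ and $S^2$ are known without sampling error.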

Question 3: What if Bayesian ability estimates have been calculated in 
accordance with a common population distribution, but it is 
later desired to estimate the distributions of specified 
subpopulations? 

Bayesian ability estimation can combine examinees' observed scores with
information from other sources, such as subpopulation membership. Suppose, for
example, that the distributions of girls and boys are $N(\mu_g, \sigma_w^2)$ and $N(\mu_b, \sigma_w^2)$
respectively, that is, normal with a common within-group variance. If $\mu_g > \mu_b$, then the Bayes
estimate for a girl with a given observed score will be higher than that of a boy with the
same score. This might be the way to bet, but it is not the way to run a fair contest, such as
awarding benefits to individuals. If Bayesian estimates are to be used at all in such a
situation, they should be calculated with the same prior distribution for all examinees, so as
to preserve rank orderings. But if individual Bayesian estimates based on a common prior
are calculated for such purposes, it follows from the preceding section that they will yield
biased estimates of subpopulation characteristics when analyzed as if they were true $\theta$s.
Specifically, the overall population mean and variance play the role of $\mu^*$ and $\sigma^{*2}$ in the
preceding section; the actual mean and variance of a subpopulation of interest correspond
to $\mu$ and $\sigma^2$; and the resulting biased estimates correspond to $M$ and $S^2$.

As an illustration, the running example of girls and boys is continued. It is
assumed that both subpopulations are of equal size, and that $\Delta$ denotes the mean difference,
$\mu_g - \mu_b$. With $\sigma_w^2$ denoting the common within-group variance, the overall population
mean and variance are

$\mu = (\mu_g + \mu_b)/2$

and

$\sigma^2 = \sigma_w^2 + \Delta^2/4$.

Although the population is actually the mixture of two normals rather than normal itself,
Equations 7 through 9 might be employed to approximate the posterior mean and variance
for each individual boy and girl. The mean of these Bayes mean estimates for girls is
obtained via Equation 18 as a weighted average of the correct value, $\mu_g$, and the overall
mean, $\mu$:

$M_g = E(\bar\theta_g \mid \mu) = \rho\,\mu_g + (1 - \rho)\,\mu$. (22)

Equation 22 shows that the degree of bias depends on $\rho$. An improved estimate of the true
girls' mean can be based on Equation 20:

$\hat\mu_g = [M_g - (1 - \rho)\,\mu]/\rho$.
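
A worked numerical version of the girls-and-boys example may help. The sketch below is illustrative: the group means, within-group variance, and error variance are assumed values, the function name is hypothetical, and the overall mean and variance are computed from the standard formulas for an equal-weight mixture of two groups.

```python
def corrected_subgroup_mean(M_g, mu_overall, sigma2_overall, sigma2_e):
    """Apply Equation 20 with the overall mean and variance as the prior."""
    rho = sigma2_overall / (sigma2_overall + sigma2_e)
    return (M_g - (1.0 - rho) * mu_overall) / rho


# Assumed values for the two equal-sized groups.
mu_g, mu_b = 0.5, -0.5
sigma2_within = 0.75
sigma2_e = 0.25

delta = mu_g - mu_b
mu_overall = (mu_g + mu_b) / 2.0                    # mean of the mixture
sigma2_overall = sigma2_within + delta ** 2 / 4.0   # variance of the mixture

rho = sigma2_overall / (sigma2_overall + sigma2_e)

# Expected mean of the girls' Bayes estimates under the common prior (Equation 22).
M_g = rho * mu_g + (1.0 - rho) * mu_overall

# Corrected estimate of the girls' mean.
mu_g_hat = corrected_subgroup_mean(M_g, mu_overall, sigma2_overall, sigma2_e)
```

With these assumed values the girls' mean of 0.5 is shrunk to 0.4 under the common prior, and the correction restores it to 0.5.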

Item Response Theory

The essential ideas of IRT are that the probabilities of multiple responses from an
examinee are driven by an unobservable proficiency variable $\theta$, and that responses are
independent given $\theta$. The 2-parameter logistic IRT model for binary (correct/incorrect) test
items, for example, gives the probability of a correct response to Item $j$ as the following
function of $\theta$:

$P(x_j = 1 \mid \theta, a_j, b_j) = \Psi[a_j(\theta - b_j)]$, (23)

where $\Psi$ denotes the logistic distribution function, $\Psi(z) = [1 + \exp(-z)]^{-1}$; a value of 1 for $x_j$
means "correct" and 0 means "incorrect"; and $a_j$ and $b_j$ are parameters of Item $j$, indicating
its sensitivity and difficulty. It is assumed in this presentation that item parameters are
known. In practice, of course, they must be estimated. The interested reader is referred to
Tsutakawa and Johnson (1990) for one technique for taking uncertainty about item
parameters into account when estimating $\theta$. If the item parameters are estimated accurately,
however, this source of uncertainty can be ignored.

Under the usual IRT assumption of conditional, or local, independence, the
probability of a vector of responses $\mathbf{x} = (x_1, \ldots, x_n)$ to $n$ items is a product of terms over
items:

$P(\mathbf{x} \mid \theta) = \prod_{j=1}^{n} P_j(\theta)^{x_j}\, Q_j(\theta)^{1 - x_j}$, (24)

where $P_j(\theta) \equiv P(x_j = 1 \mid \theta)$ and $Q_j(\theta) \equiv 1 - P_j(\theta) = P(x_j = 0 \mid \theta)$.
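
As a concrete illustration, the 2-parameter logistic response function and the likelihood of a response vector under local independence can be written in a few lines of Python. The function names and the item parameters used here are assumptions made for the example.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (Equation 23)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def likelihood(theta, responses, a_params, b_params):
    """Probability of a 0/1 response vector given theta (Equation 24)."""
    value = 1.0
    for x, a, b in zip(responses, a_params, b_params):
        p = p_correct(theta, a, b)
        value *= p if x == 1 else (1.0 - p)
    return value


# Assumed example: two items with a = 1 and b = 0, one right and one wrong.
example = likelihood(0.0, [1, 0], [1.0, 1.0], [0.0, 0.0])
```

At $\theta = 0$ each of these assumed items has a success probability of one half, so the likelihood of one correct and one incorrect response is one quarter.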

Ability Estimates for Individual Examinees 

After $\mathbf{x}$ has been observed, Equation 24 is interpreted as a likelihood function
$\ell(\theta \mid \mathbf{x})$, and serves as a basis for estimating $\theta$. The maximizing value, again denoted $\hat\theta$, is
the MLE. For samples of $\mathbf{x}$ with fixed $\theta$ and large $n$, $\hat\theta$ is approximately normally
distributed:

$\hat\theta \sim N(\theta, \sigma_e^2)$, (25)

where the estimation error variance is approximated by the reciprocal of the information
function, $I(\theta)$:

$I(\theta) = \sum_{j=1}^{n} \frac{[P_j'(\theta)]^2}{P_j(\theta)\, Q_j(\theta)}$, (26)

with $P_j'(\theta)$ denoting the first derivative of $P_j(\theta)$ with respect to $\theta$. It should be noted
that in contrast to the CTT setting, the sampling variance of the MLE depends on the value
of $\theta$. In practice, estimated standard errors are often obtained by evaluating Equation 26
with the $\hat\theta$ that corresponds to an examinee's $\mathbf{x}$. Their squares, estimated error variances,
may be denoted by $\hat\sigma_e^2$. Large-sample properties offer no guarantee of distributional
properties of $\hat\theta$ when $n$ is small, however, and even 80 items can be "small" in unfavorable
circumstances:

♦ The likelihood functions under the one-, two-, and three-parameter logistic IRT
models have no finite maxima if all the responses are correct or all are incorrect.

♦ The likelihood functions under the three-parameter models have no finite maxima
for many response patterns with few correct responses, in comparison with the sum
of the lower-asymptote item parameters.

♦ Even when finite maxima exist under the three-parameter model, likelihood
functions can be decidedly non-normal: often skewed right, sometimes multimodal
(Yen, Burkett, and Sykes, 1991).

♦ Even when likelihoods are roughly normal, the value provided by Equation 26 may
not be a good approximation of the inverse of the sampling variance of $\hat\theta$.
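
Two of the points above can be illustrated with a short Python sketch for the 2-parameter logistic model: the information function, which for this model reduces to $\sum_j a_j^2 P_j(\theta) Q_j(\theta)$, and the absence of a finite maximum for an all-correct pattern. Function names and item parameters are assumptions made for the example.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def information(theta, a_params, b_params):
    """Test information for the 2PL model.

    For the 2PL, the derivative of P is a * P * Q, so each item's
    contribution [P']^2 / (P * Q) reduces to a^2 * P * Q.
    """
    total = 0.0
    for a, b in zip(a_params, b_params):
        p = p_correct(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total


def log_likelihood(theta, responses, a_params, b_params):
    """Log-likelihood of a 0/1 response vector at theta."""
    value = 0.0
    for x, a, b in zip(responses, a_params, b_params):
        p = p_correct(theta, a, b)
        value += math.log(p) if x == 1 else math.log(1.0 - p)
    return value


# Assumed three-item test, all answered correctly.
a_params, b_params = [1.0, 1.5, 0.8], [-0.5, 0.0, 0.5]
all_correct = [1, 1, 1]
values = [log_likelihood(t, all_correct, a_params, b_params) for t in (2.0, 4.0, 8.0)]

# Approximate error variance at theta = 0, the reciprocal of the information.
error_variance = 1.0 / information(0.0, a_params, b_params)
```

For the all-correct pattern the log-likelihood keeps rising as $\theta$ increases, so no finite MLE exists, whereas the information function still returns a finite value at any given $\theta$.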

As in CTT, Bayesian IRT estimates of $\theta$ are obtained via Bayes theorem as
measures of central tendency in the posterior distribution, namely $p(\theta \mid \mathbf{x}) \propto \ell(\theta \mid \mathbf{x})\, p(\theta)$.
Bock and Mislevy (1982) outlined numerical approximations for Bayes mean estimation in
the context of IRT. One calculates the values of $\ell(\theta \mid \mathbf{x})$ and $p(\theta)$ at each point along a grid,
takes the products at each point, and rescales the results to sum to one. This procedure
yields a discrete approximation of the (possibly quite non-normal) posterior $p(\theta \mid \mathbf{x})$. Its
mean and variance are obtained by formulas for weighted means and variances, with the
points in the grid serving as observations and their respective posterior probabilities as
weights. The resulting Bayes mean estimates and posterior variances, $\bar\theta$ and $\bar\sigma^2$, can be
approximated as accurately as desired by spacing the grid points closely enough, and the
circumstances described previously that plague maximum likelihood estimation present no
such problems. The relevant formulas are shown below, with the grid points denoted $\Theta_m$,
for $m = 1, \ldots, M$:

$p(\Theta_m \mid \mathbf{x}) = \ell(\Theta_m \mid \mathbf{x})\, p(\Theta_m) \Big/ \sum_{s} \ell(\Theta_s \mid \mathbf{x})\, p(\Theta_s)$, (27)

$\bar\theta = \sum_{m} \Theta_m\, p(\Theta_m \mid \mathbf{x})$, (28)

and

$\bar\sigma^2 = \sum_{m} (\Theta_m - \bar\theta)^2\, p(\Theta_m \mid \mathbf{x})$. (29)
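
The grid procedure of Equations 27 through 29 can be sketched in Python as follows. This is a minimal illustration assuming a 2-parameter logistic model and a normal prior; the function name, the grid width of five prior standard deviations, the number of grid points, and the item parameters are all assumptions, not values taken from Bock and Mislevy (1982).

```python
import math


def eap_estimate(responses, a_params, b_params, mu=0.0, sigma2=1.0,
                 n_points=81, half_width=5.0):
    """Bayes mean (EAP) and posterior variance on an equally spaced grid."""
    sd = math.sqrt(sigma2)
    low = mu - half_width * sd
    step = 2.0 * half_width * sd / (n_points - 1)
    grid = [low + m * step for m in range(n_points)]

    # Likelihood times prior at each grid point (prior left unnormalized,
    # since the constant cancels in Equation 27).
    products = []
    for t in grid:
        prior = math.exp(-0.5 * (t - mu) ** 2 / sigma2)
        like = 1.0
        for x, a, b in zip(responses, a_params, b_params):
            p = 1.0 / (1.0 + math.exp(-a * (t - b)))
            like *= p if x == 1 else (1.0 - p)
        products.append(like * prior)

    total = sum(products)
    posterior = [w / total for w in products]                             # Equation 27
    mean = sum(t * w for t, w in zip(grid, posterior))                    # Equation 28
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, posterior))       # Equation 29
    return mean, var


# Assumed five-item test, all items answered correctly: the MLE is infinite,
# but the Bayes mean and posterior variance are finite.
a_params = [1.0, 1.2, 0.8, 1.5, 1.0]
b_params = [-1.0, -0.5, 0.0, 0.5, 1.0]
eap_mean, eap_var = eap_estimate([1, 1, 1, 1, 1], a_params, b_params)
```

A finer grid (larger `n_points`) gives a closer approximation to the continuous posterior at the cost of more computation.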



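If $p(\theta)$ were normally distributed, as in Equation 1, and if an asymptotic normal
approximation could be obtained for $\hat\theta$ via Equation 25, an examinee's Bayesian mean
estimate and posterior variance could be approximated by revising Equations 7 through 9 as
follows:

$\theta \mid \mathbf{x}, \mu, \sigma^2 \sim N(\bar\theta, \bar\sigma^2)$, (30)

with

$\bar\sigma^2 = (1 - \rho)\,\sigma^2$, (31)

$\rho = \frac{\sigma^2}{\sigma^2 + \hat\sigma_e^2}$, (32)

and

$\bar\theta = \rho\,\hat\theta + (1 - \rho)\,\mu$. (33)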

The preceding formulas apply as approximations for those examinees with
response patterns yielding finite values for $\hat\theta$ and $\hat\sigma_e^2$. Were this the case for all response
patterns in a data set, one could calculate the average error variance, and then apply the
formulas in the CTT sections to approximate population and subpopulation parameters.
For examinees with infinite MLEs, however, Equations 30 through 33 cannot be applied.
Because Bayesian estimates can be obtained for all patterns, it may be useful to
use them as the basis for approximating the population mean and variance. To motivate
the approximations, direct maximum likelihood estimation of population parameters (that
is, bypassing the step of estimating individuals' $\theta$s) is first reviewed.

Estimates for Population Parameters 

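The expression $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_N)$ may be used to denote the response vectors from a
sample of $N$ examinees. If $\theta \sim p(\theta \mid \alpha)$, where $\alpha$ is the possibly vector-valued parameter of
the distribution, the maximum likelihood estimate of $\alpha$ is obtained by maximizing the
marginal likelihood function

$L(\mathbf{X} \mid \alpha) = \prod_{i=1}^{N} \int P(\mathbf{x}_i \mid \theta)\, p(\theta \mid \alpha)\, d\theta$. (34)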



One obtains the maximum by setting to zero the first derivatives of the natural logarithm of
Equation 34 with respect to each of the elements of $\alpha$, and then finding the values that
solve the resulting likelihood equations (Mislevy, 1984). If $p(\theta \mid \alpha)$ is the univariate
normal density, for example, then $\alpha = (\mu, \sigma^2)$. Whether or not normality is assumed for the
$\theta$ distribution, the maximum likelihood estimates of the population mean and variance can
be written in terms of the posterior means and variances of the individual examinees:

$\hat\mu = \frac{1}{N} \sum_{i=1}^{N} E(\theta \mid \mathbf{x}_i, \hat\alpha)$ (35)

and

$\hat\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left\{ \mathrm{Var}(\theta \mid \mathbf{x}_i, \hat\alpha) + [E(\theta \mid \mathbf{x}_i, \hat\alpha) - \hat\mu]^2 \right\}$. (36)

That is, the MLE of $\mu$ is the mean of the Bayesian estimates of the examinees, and the MLE
of $\sigma^2$ is the sum of the average posterior variance and the variance of the posterior means,
provided that they were calculated with the correct mean and variance at the start. The
results specialize to Equations 12 and 14 in the case of CTT. Mislevy (1984) shows how
this property of "self-consistency" lies at the heart of estimating $\alpha$ by means of Dempster,
Laird, and Rubin's (1977) EM algorithm.

Approximations Based on Bayesian Estimates

One can begin with a provisional approximation for $p(\theta)$, which may, but need
not be, normal. Initial values for the mean and variance may be denoted by $\mu^*$ and $\sigma^{*2}$.
An improved approximation of $\mu$ and $\sigma^2$ is obtained by modifying the CTT correction
formulas as follows:

1. Obtain Bayes mean estimates and posterior variances, $\bar\theta_i$ and $\bar\sigma_i^2$, $i = 1, \ldots, N$, for all
examinees.

2. Calculate $M$ and $S^2$, the sample mean and variance of the $\bar\theta_i$s.

3. Calculate the average of the individual examinees' posterior variances:

$\bar\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \bar\sigma_i^2$. (37)

4. Calculate a pseudo-average error variance, analogous to $\sigma_e^2$ in the CTT solution:

$\sigma_e^{*2} = \left( \frac{1}{\bar\sigma^2} - \frac{1}{\sigma^{*2}} \right)^{-1}$. (38)

5. Calculate a pseudo-average reliability coefficient:

$\rho^* = \frac{\sigma^{*2}}{\sigma^{*2} + \sigma_e^{*2}}$. (39)

6. Apply Equations 20 and 21, with $\sigma_e^{*2}$ in place of $\sigma_e^2$, to obtain improved approximations of $\mu$ and $\sigma^2$.

Analogous formulas can be used to approximate subpopulation means and
variances when a common mean and variance was used to generate the original set
of estimates for individuals.
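
The six steps above can be sketched as a single Python function. This is an illustration under stated assumptions rather than a definitive implementation: the function name is hypothetical, the variance in step 2 is taken with divisor $N$, and the pseudo-average error variance in step 4 is assumed to be obtained by inverting the CTT relation between posterior variance, prior variance, and error variance.

```python
import statistics


def improved_population_moments(post_means, post_vars, mu_star, sigma2_star):
    """One adjustment cycle from provisional (mu*, sigma*2) to improved (mu, sigma2)."""
    n = len(post_means)

    # Step 2: mean and variance (divisor N) of the Bayes mean estimates.
    M = sum(post_means) / n
    S2 = statistics.pvariance(post_means)

    # Step 3: average posterior variance.
    avg_post_var = sum(post_vars) / n
    if not 0.0 < avg_post_var < sigma2_star:
        raise ValueError("average posterior variance must lie between 0 and sigma*2")

    # Step 4 (assumed form): pseudo-average error variance, from
    # 1 / posterior variance = 1 / prior variance + 1 / error variance.
    sigma2_e_star = 1.0 / (1.0 / avg_post_var - 1.0 / sigma2_star)

    # Step 5: pseudo-average reliability.
    rho_star = sigma2_star / (sigma2_star + sigma2_e_star)

    # Step 6: Equations 20 and 21.
    mu = (M - (1.0 - rho_star) * mu_star) / rho_star
    sigma2 = S2 / rho_star ** 2 - sigma2_e_star
    return mu, sigma2


# Assumed check in the CTT setting: true mu = 0.5, sigma2 = 2.0, error
# variance 0.5, provisional prior N(0, 1). Two artificial estimates are
# constructed to have exactly the expected mean 1/3 and variance 10/9.
d = (10.0 / 9.0) ** 0.5
means = [1.0 / 3.0 - d, 1.0 / 3.0 + d]
variances = [1.0 / 3.0, 1.0 / 3.0]
mu_hat, sigma2_hat = improved_population_moments(means, variances, 0.0, 1.0)
```

In practice the posterior means and variances in step 1 would come from IRT scoring software run with the provisional prior, and the function could be applied repeatedly, feeding each cycle's output back in as the next provisional values.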

A Numerical Illustration 

This example is based on the responses of 325 students to a 19-item test. The items
were open-ended, and the two-parameter logistic model was fit to the data with Mislevy
and Bock's (1983) BILOG program. The scale was set so that the mean and variance of
the sample were 0 and 1 respectively. The approximation formulas of the preceding
section were employed, starting from values for $\mu^*$ of -1, 0, and 1, crossed with values
for $\sigma^{*2}$ of .25, 1, and 4. From the resulting improved estimates in each combination, a
second approximating step was then carried out. The results are shown in Figure 1.

[ Figure 1 about here ] 

Each panel in Figure 1 contains the following values: 

♦ Provisional estimates at the start of an approximation cycle, $\mu^*$ and $\sigma^{*2}$. With
these, Bayesian posterior means and variances were calculated for all examinees
using BILOG.

♦ Intermediate calculations $M$, $S^2$, $\rho^*$, and $\sigma_e^{*2}$, which are functions of $\mu^*$ and $\sigma^{*2}$
and the estimates of provisional posterior means and variances for individual
examinees based on $\mu^*$ and $\sigma^{*2}$.

♦ The resulting updated estimates $\hat\mu$ and $\hat\sigma^2$.

The center panel starts from, and returns to, the MLE values of 0 and 1. The panels
around the perimeter correspond to initial values for $\mu^*$ of -1, 0, or 1, and initial values
for $\sigma^{*2}$ of .25, 1, or 4. The resulting improved estimates were used in turn for a second
adjustment cycle, summaries of which appear in the next panel closer to the center.

Although this example is meant to be illustrative rather than comprehensive, some
tentative observations can be made from the results. In each case, a single adjustment step
produced an accurate estimate of the mean. Even from the initial approximations farthest
from the correct value, a single step would have been sufficient. The adjustments also
improved the estimates of the population variance in each case, but not by as much
(although it may be noted that the results are given in terms of variances rather than
standard deviations; standard deviations are off by only about 5 percent). Unless initial
approximations are fairly accurate, it would appear prudent to carry out at least two adjustment
steps in order to obtain a satisfactory approximation of the variance.